Skip to content

[WIP] Show SASS diffs in codediff report#4264

Draft
jacobhinkle wants to merge 7 commits intomainfrom
jh/codediff_sass
Draft

[WIP] Show SASS diffs in codediff report#4264
jacobhinkle wants to merge 7 commits intomainfrom
jh/codediff_sass

Conversation

@jacobhinkle
Copy link
Collaborator

This is almost working but there are problems with the javascript that need to be debugged.

@github-actions
Copy link

Description

  • Added support for dumping SASS to file

  • Updated compiled_kernel.cpp to handle SASS dumping

  • Modified codediff.py to include SASS in parsing and diffing

  • Enhanced compare_codegen.sh and run_command.sh to handle SASS files


Changes walkthrough 📝

Relevant files
Enhancement
9 files
options.cpp
Added `SassToFile` option                                                               
+1/-0     
compiled_kernel.cpp
Added SASS dumping to compiled kernel                                       
+7/-0     
codediff.py
Included SASS in kernel parsing and diffing                           
+101/-15
compare_codegen.sh
Updated to handle SASS files                                                         
+3/-3     
run_command.sh
Updated to handle SASS files                                                         
+8/-6     
test_diff_section.html
Added SASS buttons and sections                                                   
+13/-2   
inline_javascript.js
Updated functions to handle SASS                                                 
+243/-191
options.h
Added `SassToFile` option                                                               
+3/-2     
executor_utils.h
Added SASS fields to `CudaExecutable`                                       
+2/-0     

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Possible Issue

The disassembleBinary function is called with a fixed argument "-fun 1 -c". This might not be the correct or optimal argument for all cases. It should be validated that this argument works as expected and consider making it configurable.

std::string sass_str =
    disassembleBinary(compiled_kernel->cubin, "-fun 1 -c");
Code Duplication

The toggleOldCode, toggleNewCode, and toggleDiff functions in inline_javascript.js have a lot of repeated code. This can be refactored to reduce duplication and improve maintainability.

                ["git", "branch", "--quiet", "--color=never", self.full_hash],
                capture_output=True,
            )
            .stdout.strip()
            .decode("utf-8")
            .splitlines()
        ):
            # Possible output:
            #
            #     main
            #     * scalar_seg_edges
            #
            # In this case, we have checked out the HEAD of the
            # scalar_seg_edges branch. Here we just strip the *.
            if line[0] == "*":
                line = line[2:]
                in_branches.append(line)

        def git_show(fmt) -> str:
            return (
                subprocess.run(
                    [
                        "git",
                        "show",
                        "--no-patch",
                        f"--format={fmt}",
                        self.full_hash,
                    ],
                    capture_output=True,
                )
                .stdout.strip()
                .decode("utf-8")
            )

        self.title = git_show("%s")
        self.author_name = git_show("%an")
        self.author_email = git_show("%ae")
        self.author_time = git_show("%ad")
        self.commit_time = git_show("%cd")


@dataclass_json
@dataclass
class LaunchParams:
    blockDim: tuple[int]
    gridDim: tuple[int]
    dynamic_smem_bytes: int


@dataclass_json
@dataclass
class CompiledKernel:
    filename: str
    cuda: str | None = None
    ptx: str | None = None
    sass: str | None = None
    ptxas_info: str | None = None
    launch_params_str: str | None = None
    launch_params: LaunchParams | None = None
    gmem_bytes: int = 0
    smem_bytes: int = 0
    cmem_bank_bytes: list[int] | None = None
    registers: int | None = None
    stack_frame_bytes: int = 0
    spill_store_bytes: int = 0
    spill_load_bytes: int = 0
    mangled_name: str | None = None
    arch: str | None = None
    index_type: str | None = None

    def __post_init__(self):
        self.parse_ptxas()
        self.parse_launch_params()

    def parse_ptxas(self):
        # Example input:
        #
        #   ptxas info    : 307 bytes gmem
        #   ptxas info    : Compiling entry function
        #   '_ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_'
        #   for 'sm_86'
        #   ptxas info    : Function properties for
        #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_
        #   ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
        #   ptxas info    : Used 203 registers, 16 bytes smem, 472 bytes cmem[0], 8 bytes cmem[2]
        #
        # Here we parse this into the fields presented, and we replace the
        # mangled kernel name since it includes the kernel number and is
        # useless for the purposes of diffing since the kernel signature is
        # already included.
        if self.ptxas_info is None:
            return

        m = re.search(r"Compiling entry function '(.*)' for '(.*)'", self.ptxas_info)
        if m is not None:
            self.mangled_name, self.arch = m.groups()

        def find_unique_int(pattern) -> int | None:
            assert self.ptxas_info is not None
            m = re.search(pattern, self.ptxas_info)
            return 0 if m is None else int(m.groups()[0])

        self.stack_frame_bytes = find_unique_int(r"(\d+) bytes stack frame")
        self.spill_store_bytes = find_unique_int(r"(\d+) bytes spill stores")
        self.spill_load_bytes = find_unique_int(r"(\d+) bytes spill loads")
        self.registers = find_unique_int(r"(\d+) registers")
        self.gmem_bytes = find_unique_int(r"(\d+) bytes gmem")
        self.smem_bytes = find_unique_int(r"(\d+) bytes smem")

        self.cmem_bank_bytes = []
        cmem_banks = 0
        for m in re.finditer(r"(\d+) bytes cmem\[(\d+)\]", self.ptxas_info):
            nbytes_str, bank_str = m.groups()
            bank = int(bank_str)
            if len(self.cmem_bank_bytes) <= bank:
                self.cmem_bank_bytes += [0] * (bank + 1 - len(self.cmem_bank_bytes))
            self.cmem_bank_bytes[bank] = int(nbytes_str)
            cmem_banks += 1

    def parse_launch_params(self):
        # If NVFUSER_DUMP=launch_param is given we will get a line like this for every launch:
        #   Launch Parameters: BlockDim.x = 32, BlockDim.y = 2, BlockDim.z = 2, GridDim.x = 8, GridDim.y = 8, GridDim.z = -1, Smem Size = 49152
        # This is not done by default since we might have hundreds of thousands of these lines.
        # Still, if we recognize it, we will parse this info. If there are
        # multiple lines, we just check that they are all equal and if not then
        # we keep the first version and print a warning.
        if self.launch_params_str is None:
            return

        for line in self.launch_params_str.splitlines():
Potential Error

The sanitize_sass_lines function uses a complex regex to demangle kernel names. This regex might not cover all possible cases and could fail for certain kernel names. It should be tested thoroughly to ensure it handles all expected inputs.

"""Remove comments and remove kernel id"""
sanitary_lines = []
for l in lines:
    # Replace mangled kernel names like
    #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_pointwise_f0_c1_r0_g0_cu_8995cef2_3255329nvfuser_pointwise_f0_c1_r0_g0ENS_6TensorIfLi2ELi2EEES1_S1_
    # or
    #   _ZN76_GLOBAL__N__00000000_37___tmp_kernel_4_cu_8995cef2_3255329nvfuser_4ENS_6TensorIfLi2ELi2EEES1_S1_
    # or
    #   _ZN57_GLOBAL__N__00000000_18___tmp_nvfuser_5_cu_badbb5a6_975149nvfuser_5ENS_6TensorINS_6__halfELi3ELi3EEES2_NS_9TensorMapES3_NS0_IS1_Li2ELi2EEE,(.L_x_28 - _ZN11kernelscope6kernelENS_6TensorINS_6__halfELi3ELi3EEES2_NS_9TensorMapES3_NS0_IS1_Li2ELi2EEE)

    # with
    #   _ZN11kernelscope6kernelENS_6TensorIfLi2ELi2EEES1_S1_

    # demangle first two parts after _ZN and replace with "kernelscope" and "kernel"
    m = re.match(r"^(?P<prefix>^.*\b_Z?ZN)(?P<scopenamelen>\d+)_", l)
    if m is not None:
        d = m.groupdict()
        scopenamelen = int(d["scopenamelen"])
        # demangle second part in remainder after scope name
        remainder = l[(len(d["prefix"]) + len(d["scopenamelen"]) + scopenamelen) :]
        mrem = re.match(r"^(?P<varnamelen>\d+)", remainder)
        if mrem is not None:
            drem = mrem.groupdict()
            varnamelen = int(drem["varnamelen"])
            remainder = (
                "6kernel" + remainder[len(drem["varnamelen"]) + varnamelen :]
            )
        l = d["prefix"] + "11kernelscope" + remainder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant